On the VC-Dimension of Univariate Decision Trees
Abstract
In this paper, we give and prove lower bounds on the VC-dimension of the univariate decision tree hypothesis class. The VC-dimension of a univariate decision tree depends on the VC-dimension values of its subtrees and on the number of inputs. In our previous work (Aslan et al., 2009), we proposed a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively. Using the experimental results of that work, we show that our VC-dimension bounds are tight. To verify that the VC-dimension bounds are useful, we also use them to obtain VC generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM pruning using the VC-dimension bounds finds trees that are as accurate as those pruned using cross-validation.
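The SRM pruning mentioned above relies on a VC generalization bound. As a sketch (using the classical Vapnik bound form, not necessarily the exact expression derived in the paper), the guaranteed generalization error of a subtree hypothesis class with VC-dimension $V$ on the $N$ training examples reaching that node can be bounded, with probability $1-\eta$, as:

```latex
E_{\mathrm{gen}}(h) \;\le\; E_{\mathrm{train}}(h)
\;+\; \sqrt{\frac{V\left(\ln\frac{2N}{V} + 1\right) - \ln\frac{\eta}{4}}{N}}
```

In SRM pruning, the bound is evaluated once for the subtree and once for the leaf that would replace it; the subtree is pruned whenever the leaf's bound is no worse, trading a possibly higher training error against the smaller complexity penalty of the lower-VC-dimension model.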
Similar Resources
Model selection in omnivariate decision trees using Structural Risk Minimization
As opposed to trees that use a single type of decision node, an omnivariate decision tree contains nodes of different types. We propose to use Structural Risk Minimization (SRM) to choose between node types in omnivariate decision tree construction to match the complexity of a node to the complexity of the data reaching that node. In order to apply SRM for model selection, one needs the VC-dime...
Near-optimal linear decision trees for k-SUM and related problems
We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log² n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, compa...
Generalization Behaviour of Alkemic Decision Trees
This paper is concerned with generalization issues for a decision tree learner for structured data called Alkemy. Motivated by error bounds established in statistical learning theory, we study the VC dimensions of some predicate classes defined on sets and multisets – two data-modelling constructs used intensively in the knowledge representation formalism of Alkemy – and from that obtain insigh...
Approximate Faithful Embedding in Learning
In this paper we consider the problem of embedding the input and hypotheses of boolean function classes in other classes, such that the natural metric structure of the two spaces is approximately preserved. We first prove some general properties of such embedding and then suggest and discuss possible approximate embedding in the class of "half-spaces" (single layer perceptrons) with dimension pol...
Theory and Applications of Agnostic PAC-Learning with Small Decision Trees
We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common “real-world” datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power...